CMInject: Python framework for the numerical simulation of nanoparticle injection pipelines
CMInject simulates nanoparticle injection experiments of particles with diameters in the micrometer to nanometer regime, e.g., for single-particle-imaging experiments. Particle-particle interactions and particle-induced changes in the surrounding fields are disregarded, owing to the low nanoparticle concentrations in these experiments. CMInject's focus lies on the correct modeling of different forces on such particles, such as fluid-dynamical or light-induced interactions, to allow for simulations that further the scientific development of nanoparticle injection pipelines. To provide a usable basis for this framework and allow a variety of experiments to be simulated, we implemented a first set of specific force models: fluid drag forces, Brownian motion, and photophoretic forces. For verification, we benchmarked a drag-force-based simulation against a nanoparticle focusing experiment. We envision its use and further development by experimentalists, theorists, and software developers.
Program summary:
Program Title: CMInject
CPC Library link to program files: https://doi.org/10.17632/rbpgn4fk3z.1
Developer's repository link: https://github.com/cfel-cmi/cminject
Code Ocean capsule: https://codeocean.com/capsule/5146104
Licensing provisions: GPLv3
Programming language: Python 3
Supplementary material: Code to reproduce and analyze simulation results, example input and output data, video files of trajectory movies
Nature of problem: Well-defined, reproducible, and interchangeable simulation setups of experimental injection pipelines for biological and artificial nanoparticles, in particular pipelines that aim to advance the field of single-particle imaging.
Solution method: The definition and implementation of an extensible Python 3 framework to model and execute such simulation setups, based on object-oriented software design and making use of parallelization facilities and modern numerical integration routines.
Additional comments including restrictions and unusual features: Supplementary executable scripts for quantitative and visual analyses of result data are also part of the framework.
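The fluid-drag force model listed above can be illustrated by a minimal explicit-Euler velocity update under Stokes drag (a generic sketch, not CMInject's actual API; the function name and particle parameters are ours, and the default viscosity value approximates air at room temperature):

```python
import numpy as np

def stokes_drag_step(v_particle, v_fluid, radius, mass, dt, mu=1.8e-5):
    """Advance a spherical particle's velocity by one explicit-Euler step
    under Stokes drag, F = 6*pi*mu*r*(v_fluid - v_particle).
    mu is the dynamic viscosity of the carrier gas in Pa*s."""
    force = 6.0 * np.pi * mu * radius * (v_fluid - v_particle)
    return v_particle + force / mass * dt
```

Iterated over many steps, the particle velocity relaxes toward the local fluid velocity with time constant m / (6*pi*mu*r); a production model for particles at these length scales would typically also include corrections such as slip effects.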
DiffPhase: Generative Diffusion-based STFT Phase Retrieval
Diffusion probabilistic models have been recently used in a variety of tasks,
including speech enhancement and synthesis. As a generative approach, diffusion
models have been shown to be especially suitable for imputation problems, where
missing data is generated based on existing data. Phase retrieval is inherently
an imputation problem, where phase information has to be generated based on the
given magnitude. In this work we build upon previous work in the speech domain,
adapting a speech enhancement diffusion model specifically for STFT phase
retrieval. Evaluation using speech quality and intelligibility metrics shows
the diffusion approach is well-suited to the phase retrieval task, with
performance surpassing both classical and modern methods.
Comment: Submitted to ICASSP 202
DriftRec: Adapting diffusion models to blind JPEG restoration
In this work, we utilize the high-fidelity generation abilities of diffusion
models to solve blind JPEG restoration at high compression levels. We propose
an elegant modification of the forward stochastic differential equation of
diffusion models to adapt them to this restoration task and name our method
DriftRec. Comparing DriftRec against a regression baseline with the same
network architecture and two state-of-the-art techniques for JPEG restoration,
we show that our approach can escape the tendency of other methods to generate
blurry images, and recovers the distribution of clean images significantly more
faithfully. For this, only a dataset of clean/corrupted image pairs and no
knowledge about the corruption operation is required, enabling wider
applicability to other restoration tasks. In contrast to other conditional and
unconditional diffusion models, we utilize the idea that the distributions of
clean and corrupted images are much closer to each other than each is to the
usual Gaussian prior of the reverse process in diffusion models. Our approach
therefore requires only low levels of added noise, and needs comparatively few
sampling steps even without further optimizations. We show that DriftRec
naturally generalizes to realistic and difficult scenarios such as unaligned
double JPEG compression and blind restoration of JPEGs found online, without
having encountered such examples during training.
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement
Recently, score-based generative models have been successfully employed for
the task of speech enhancement. A stochastic differential equation is used to
model the iterative forward process, where at each step environmental noise and
white Gaussian noise are added to the clean speech signal. While in the limit the mean of the forward process ends at the noisy mixture, in practice the process stops earlier and thus reaches only an approximation of the noisy mixture. This results in
a discrepancy between the terminating distribution of the forward process and
the prior used for solving the reverse process at inference. In this paper, we
address this discrepancy and propose a forward process based on a Brownian
bridge. We show that such a process leads to a reduction of the mismatch
compared to previous diffusion processes. More importantly, we show that our
approach improves on the baseline process in objective metrics with only half the iteration steps and one fewer hyperparameter to tune.
Comment: 5 pages, 2 figures, Accepted to Interspeech 2022
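The Brownian-bridge construction can be sketched as follows (a generic illustration with a constant diffusion coefficient, not necessarily the exact parameterization used in the paper). For clean speech $x_0$ and noisy mixture $y$, the bridge SDE on $t \in [0, T)$ reads

```latex
dx_t = \frac{y - x_t}{T - t}\, dt + g\, dw_t,
\qquad
\mathbb{E}[x_t] = \Bigl(1 - \frac{t}{T}\Bigr) x_0 + \frac{t}{T}\, y,
```

so the process mean interpolates linearly between clean speech and noisy mixture and terminates exactly at $y$ when $t = T$, removing the prior mismatch described above.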
A Flexible Online Framework for Projection-Based STFT Phase Retrieval
Several recent contributions in the field of iterative STFT phase retrieval
have demonstrated that the performance of the classical Griffin-Lim method can
be considerably improved upon. By using the same projection operators as
Griffin-Lim, but combining them in innovative ways, these approaches achieve
better results in terms of both reconstruction quality and required number of
iterations, while retaining a similar computational complexity per iteration.
However, like Griffin-Lim, these algorithms operate in an offline manner and
thus require an entire spectrogram as input, which is an unrealistic
requirement for many real-world speech communication applications. We propose
to extend RTISI -- an existing online (frame-by-frame) variant of the
Griffin-Lim algorithm -- into a flexible framework that enables straightforward
online implementation of any algorithm based on iterative projections. We
further employ this framework to implement online variants of the fast
Griffin-Lim algorithm, the accelerated Griffin-Lim algorithm, and two
algorithms from the optics domain. Evaluation results on speech signals show
that, similarly to the offline case, these algorithms can achieve a
considerable performance gain compared to RTISI.
Comment: Submitted to ICASSP 2
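The projection operators shared by these methods can be sketched with a minimal offline Griffin-Lim loop (a generic illustration built on SciPy's STFT routines; the function name and parameters are ours, and this is the batch algorithm the contributions above improve upon, not the proposed online framework):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256, noverlap=128, seed=0):
    """Classical Griffin-Lim phase retrieval: alternate between the
    consistency projection (ISTFT followed by STFT) and the magnitude
    projection (replace magnitudes, keep phases)."""
    rng = np.random.default_rng(seed)
    # Initialize the spectrogram with uniformly random phases.
    Z = mag * np.exp(1j * rng.uniform(-np.pi, np.pi, mag.shape))
    for _ in range(n_iter):
        _, x = istft(Z, nperseg=nperseg, noverlap=noverlap)    # project onto consistent spectrograms...
        _, _, Z = stft(x, nperseg=nperseg, noverlap=noverlap)  # ...via a round trip through the time domain
        Z = mag * np.exp(1j * np.angle(Z))                     # project onto the given magnitudes
    _, x = istft(Z, nperseg=nperseg, noverlap=noverlap)
    return x
```

Online variants such as RTISI apply the same two projections, but frame by frame with a small look-ahead buffer instead of over the entire spectrogram.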
Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration
Diffusion-based generative models have had a high impact on the computer
vision and speech processing communities these past years. Besides data
generation tasks, they have also been employed for data restoration tasks like
speech enhancement and dereverberation. While discriminative models have
traditionally been argued to be more powerful, e.g., for speech enhancement,
generative diffusion approaches have recently been shown to narrow this
performance gap considerably. In this paper, we systematically compare the
performance of generative diffusion models and discriminative approaches on
different speech restoration tasks. For this, we extend our prior contributions
on diffusion-based speech enhancement in the complex time-frequency domain to
the task of bandwidth extension. We then compare it to a discriminatively
trained neural network with the same network architecture on three restoration
tasks, namely speech denoising, dereverberation and bandwidth extension. We
observe that the generative approach performs globally better than its
discriminative counterpart on all tasks, with the strongest benefit for
non-additive distortion models, like in dereverberation and bandwidth
extension. Code and audio examples can be found online at
https://uhh.de/inf-sp-sgmsemultitask
Comment: Submitted to ICASSP 202
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
In this work, we build upon our previous publication and use diffusion-based
generative models for speech enhancement. We present a detailed overview of the
diffusion process that is based on a stochastic differential equation and delve
into an extensive theoretical examination of its implications. In contrast to usual
conditional generation tasks, we do not start the reverse process from pure
Gaussian noise but from a mixture of noisy speech and Gaussian noise. This
matches our forward process which moves from clean speech to noisy speech by
including a drift term. We show that this procedure enables using only 30
diffusion steps to generate high-quality clean speech estimates. By adapting
the network architecture, we are able to significantly improve the speech
enhancement performance, indicating that the network, rather than the
formalism, was the main limitation of our original approach. In an extensive
cross-dataset evaluation, we show that the improved method can compete with
recent discriminative models and achieves better generalization when evaluating
on a different corpus than used for training. We complement the results with an
instrumental evaluation using real-world noisy recordings and a listening
experiment, in which our proposed method is rated best. Examining different
sampler configurations for solving the reverse process allows us to balance the
performance and computational speed of the proposed method. Moreover, we show
that the proposed method is also suitable for dereverberation and thus not
limited to additive background noise removal. Code and audio examples are
available online, see https://github.com/sp-uhh/sgmse
Comment: Accepted version
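The drift-based forward process described above can be written schematically as follows (an Ornstein-Uhlenbeck-type parameterization in the spirit of the authors' formulation; $\gamma$ denotes a stiffness constant and $g(t)$ the noise schedule):

```latex
dx_t = \gamma\, (y - x_t)\, dt + g(t)\, dw_t,
\qquad
\mathbb{E}[x_t] = y + e^{-\gamma t}\, (x_0 - y),
```

so the process mean decays exponentially from the clean speech $x_0$ toward the noisy mixture $y$, which is why the reverse process can be initialized from a mixture-plus-Gaussian-noise prior rather than from pure Gaussian noise.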
EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data
Speech emotion conversion is the task of converting the expressed emotion of
a spoken utterance to a target emotion while preserving the lexical content and
speaker identity. While most existing works in speech emotion conversion rely
on acted-out datasets and parallel data samples, in this work we specifically
focus on more challenging in-the-wild scenarios and do not rely on parallel
data. To this end, we propose a diffusion-based generative model for speech
emotion conversion, the EmoConv-Diff, that is trained to reconstruct an input
utterance while also conditioning on its emotion. Subsequently, at inference, a
target emotion embedding is employed to convert the emotion of the input
utterance to the given target emotion. As opposed to performing emotion
conversion on categorical representations, we use a continuous arousal
dimension to represent emotions while also achieving intensity control. We
validate the proposed methodology on a large in-the-wild dataset, the
MSP-Podcast v1.10. Our results show that the proposed diffusion model is indeed
capable of synthesizing speech with a controllable target emotion. Crucially,
the proposed approach shows improved performance along the extreme values of
arousal and thereby addresses a common challenge in the speech emotion
conversion literature.
Comment: Submitted to ICASSP 202
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain
Score-based generative models (SGMs) have recently shown impressive results
for difficult generative tasks such as the unconditional and conditional
generation of natural images and audio signals. In this work, we extend these
models to the complex short-time Fourier transform (STFT) domain, proposing a
novel training task for speech enhancement using a complex-valued deep neural
network. We derive this training task within the formalism of stochastic
differential equations, thereby enabling the use of predictor-corrector
samplers. We provide alternative formulations inspired by previous publications
on using SGMs for speech enhancement, avoiding the need for any prior
assumptions on the noise distribution and making the training task purely
generative which, as we show, results in improved enhancement performance.
Comment: Submitted to INTERSPEECH 202
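The stochastic-differential-equation formalism referenced here follows the standard score-based generative modeling setup of a forward SDE and its reverse-time counterpart (a generic statement of the framework, not the paper's specific parameterization):

```latex
dx = f(x, t)\, dt + g(t)\, dw,
\qquad
dx = \bigl[f(x, t) - g(t)^2\, \nabla_x \log p_t(x)\bigr]\, dt + g(t)\, d\bar{w},
```

where the score $\nabla_x \log p_t(x)$ is approximated by the trained network; predictor-corrector samplers alternate a discretized reverse-SDE step (predictor) with score-based Langevin correction steps.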
BEAR reveals that increased fidelity variants can successfully reduce the mismatch tolerance of adenine but not cytosine base editors
Base editors allow for precision engineering of the genome. Here, the authors present BEAR, a plasmid-based fluorescence assay for the measurement of CBE and ABE activity, to reveal the mechanism underlying their differences and to increase the yield of edited cells with reduced indel background.